DOCUMENT RESUME

ED 453 655                                    FL 026 714

AUTHOR        Nakamura, Yuji
TITLE         Rasch Based Analysis of Oral Proficiency Test Data.
INSTITUTION   International Christian Univ., Tokyo (Japan).
PUB DATE      2001-03-00
NOTE          9p.
PUB TYPE      Journal Articles (080) -- Numerical/Quantitative Data (110) -- Reports - Descriptive (141)
JOURNAL CIT   Educational Studies; v43 p191-197 March 2001
EDRS PRICE    MF01/PC01 Plus Postage.
DESCRIPTORS   Communicative Competence (Languages); Dialogs (Language); Elementary Secondary Education; *Factor Analysis; Foreign Countries; Language Proficiency; Monologs; *Oral Language; Rating Scales; Second Language Instruction; Second Language Learning; Student Evaluation; *Test Format; Testing
IDENTIFIERS   Japan; *Oral Proficiency Testing; *Rasch Model



ABSTRACT 



This paper examines the rating scale data of oral proficiency tests analyzed by a Rasch Analysis focusing on an item map and factor analysis. In discussing the item map, the difficulty order of six items and students' answering patterns are analyzed using descriptive statistics and measures of central tendency of test scores. The data ranks the items from easiest to most difficult. The factor analysis shows that one factor should be person-related, and the other should be linguistics-related. The result of the Rasch analysis suggests that three tests are needed to make a more precise measurement of the students' communicative language ability. The first is a test for monologue ability, the second for multilogue ability, and the third for dialogue ability. To truly understand the students' language ability, it must be examined from several different viewpoints. (Contains an abstract in Japanese.) (KFT)



Reproductions supplied by EDRS are the best that can be made 
from the original document. 









Educational Studies 43, March 2001



Rasch Based Analysis of Oral Proficiency Test Data 


NAKAMURA, Yuji 




International Christian University 




Articles

Rasch Based Analysis of Oral Proficiency Test Data 

NAKAMURA, Yuji

Tokyo Keizai University



Keywords

Oral Proficiency Test, Rasch Model, Item Map, Factor Analysis



ABSTRACT 

[Japanese-language abstract; the text is not recoverable from this copy. Legible terms: unidimensionality, local independence, Monologue, Dialogue, Multilogue, Oral Proficiency.]






Educational Studies 43 191 
International Christian University 





This paper examines rating scale data from Oral Proficiency Tests, previously analyzed using raw scores, from the viewpoint of Rasch analysis, focusing on two things: 1) the item map and 2) factor analysis.

First, we will discuss the item map. Nakamura (1999) examines the difficulty order of the 6 items and the students' answering patterns using descriptive statistics and the frequency distributions of the students' test scores. The data show that TMS is the easiest item, followed by GI, SO, SGD, and FFI, with LGD the most difficult. The data also show that the lowest rating is rarely used except in LGD, which in itself indicates that LGD is the difficult item. All of this information was obtained without the device of an item map.

N.B. Abbreviations used:

SO: Speech Making Overall
TMS: Tape Mediated Sociolinguistic Test
FFI: Face to Face Interview
GI: Group Interview
SGD: Small Group Discussion Test
LGD: Large Group Discussion Test

Table 1 is an item map prepared from the Rasch measured scores in the present research. It offers another way of looking at the relationship between person and item. The map shows the position of each of the four rating categories (very good: 4, good: 3, fair: 2, poor: 1) for the six items arranged in order of difficulty. This type of map can be used, as Bode and Wright (1999) claim, to develop a quick-scoring method that takes the difficulty level of individual categories and items into account. Here we can observe that, in LGD, the ratings were well balanced, with a certain number of students given the lowest rating, while in GI and TMS very few students were given the lowest rating. In other words, LGD turned out to be the most difficult item. Compared with Nakamura (1999), the item map gives this information more quickly and more comprehensively, with an easier data handling process.
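
The difficulty order can also be read directly from the item measures that WINSTEPS reports in the MEASURE column of Tables 2 to 4. The following is a minimal sketch in Python that uses only those reported measures; the variable names are illustrative and not part of the original analysis.

```python
# Item measures as reported in the MEASURE column of Tables 2-4.
item_measures = {
    "SO": 47.1,
    "TMS": 41.8,
    "FFI": 56.6,
    "GI": 42.5,
    "SGD": 50.8,
    "LGD": 61.0,
}

# A lower measure means an easier item, so sort in ascending order.
difficulty_order = sorted(item_measures, key=item_measures.get)

print(" < ".join(difficulty_order))  # prints "TMS < GI < SO < SGD < FFI < LGD"
```

The resulting order agrees with the raw-score order reported in Nakamura (1999): TMS is the easiest item and LGD the most difficult.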



Table 1

INPUT: 46 STUDENTS, 6 ITEMS ANALYZED: 45 STUDENTS, 6 ITEMS, 4 CATS WINSTEPS v2.85

EXPECTED SCORE: MEAN (":" INDICATES HALF-SCORE POINT)

[Item map. Horizontal axis: measure, 0 to 100. Rows, top to bottom (NUM ITEM): 6 LGD, 3 FFI, 5 SGD, 1 SO, 4 GI, 2 TMS, each showing the positions of rating categories 1 to 4, with circled category numbers. The STUDENT distribution is plotted beneath the axis. The plotted positions are not legible in this copy.]



Bode and Wright (1999) state further that: 
“One can record item responses on such a 
map, determine the approximate average 
horizontal position by eye, and draw a vertical 
line down to the expected score at the bottom 
to estimate the overall measure. Unexpected 
responses which digress from the vertical line 
are easily spotted and can be used 
diagnostically. The empty spaces between 
items indicate a significant difference in their
difficulty — a difference greater than two 
standard errors of their calibration
estimates.” (p. 309) 

This type of item map can also be used to 
describe the frequency of activities reported 
by an individual student (cf. Bode and Wright 
1999). Let us look at an example of the location 
of category labels vertically above a measure 
of 50. 

Consider a student with a measure of 50, who is expected to get 16 points in total from the 6 test items, as indicated by the circled numbers in Table 1. This student is rated above the middle point (2.5 on the 1-4 scale) in TMS, GI, SO and SGD, while he / she is rated below the middle point in FFI and LGD. In GI and TMS, this student is rated closer to 3 points, whereas in LGD he / she is rated at almost 2 points. Overall, this average student is expected to get lower points than the middle point in LGD and FFI, which seems to explain the difficulty level of these items. However, if this student gets only one point in TMS, there is something wrong with this student. We should investigate the case diagnostically at once.

Let us take a look at another example. This time, consider a student with a measure of 80 (in Table 1-a). This student is expected to get 4 points in TMS, GI, SO and SGD, and 3 points in FFI and LGD. If this student gets only 2 points in LGD, something is wrong with this student or the item. We should diagnostically examine the reason for the misfitting case. Thus, this item map can help to locate the misfitting part quite quickly.
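
The logic of reading an expected rating from the map and flagging a response that departs from it can be sketched numerically. The Python sketch below assumes a Rasch rating scale model with four categories. The item measures are taken from Tables 2 to 4, but the rescaling factor `UNITS_PER_LOGIT` and the thresholds `TAUS` are hypothetical values chosen only for illustration, since the paper does not report them. The printed numbers therefore will not reproduce the map exactly.

```python
import math

# Item measures from the MEASURE column of Tables 2-4 (user-scaled units).
ITEM_MEASURES = {"TMS": 41.8, "GI": 42.5, "SO": 47.1,
                 "SGD": 50.8, "FFI": 56.6, "LGD": 61.0}

UNITS_PER_LOGIT = 10.0   # ASSUMPTION: the rescaling is not stated in the paper
TAUS = [-2.0, 0.0, 2.0]  # ASSUMPTION: hypothetical category thresholds, in logits


def category_probs(gap_logits, taus):
    """Rating scale model probabilities for categories 1..len(taus)+1."""
    cumulative = [0.0]
    for tau in taus:
        cumulative.append(cumulative[-1] + gap_logits - tau)
    peak = max(cumulative)  # subtract the maximum for numerical stability
    weights = [math.exp(value - peak) for value in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def expected_rating(person, item):
    """Return (expected rating, standard deviation) for one person-item pair."""
    gap = (person - ITEM_MEASURES[item]) / UNITS_PER_LOGIT
    probs = category_probs(gap, TAUS)
    mean = sum(k * p for k, p in enumerate(probs, start=1))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs, start=1))
    return mean, math.sqrt(var)


def standardized_residual(person, item, observed):
    """Observed rating minus expected rating, in standard deviation units."""
    mean, sd = expected_rating(person, item)
    return (observed - mean) / sd


for item in ITEM_MEASURES:
    mean, _ = expected_rating(80, item)
    print(f"{item}: expected rating {mean:.2f}")
print("z for a rating of 2 in LGD:",
      round(standardized_residual(80, "LGD", 2), 2))
```

A strongly negative standardized residual corresponds to the kind of unexpected low rating that the text proposes to examine diagnostically. Any cut-off for what counts as strongly negative would be a further assumption.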



Table 1-a

INPUT: 46 STUDENTS, 6 ITEMS ANALYZED: 45 STUDENTS, 6 ITEMS, 4 CATS WINSTEPS v2.85

EXPECTED SCORE: MEAN (":" INDICATES HALF-SCORE POINT)

[Item map laid out as in Table 1, with the expected ratings at a measure of 80 circled. Horizontal axis: measure, 0 to 100. Rows, top to bottom (NUM ITEM): 6 LGD, 3 FFI, 5 SGD, 1 SO, 4 GI, 2 TMS. The STUDENT distribution is plotted beneath the axis. The plotted positions are not legible in this copy.]








Table 1-b

INPUT: 46 STUDENTS, 6 ITEMS ANALYZED: 45 STUDENTS, 6 ITEMS, 4 CATS WINSTEPS v2.85

EXPECTED SCORE: MEAN (":" INDICATES HALF-SCORE POINT)

[Item map laid out as in Table 1, with the expected ratings at a measure of 30 circled. Horizontal axis: measure, 0 to 100. Rows, top to bottom (NUM ITEM): 6 LGD, 3 FFI, 5 SGD, 1 SO, 4 GI, 2 TMS. The STUDENT distribution is plotted beneath the axis. The plotted positions are not legible in this copy.]




Table 2

FACTOR 1 FROM PRINCIPAL COMPONENT ANALYSIS OF STANDARDIZED RESIDUAL CORRELATIONS FOR ITEMS
(SORTED BY LOADING) FACTOR 1 EXPLAINS 1.83 OF 6 VARIANCE UNITS

FACTOR  LOADING  MEASURE  INFIT MNSQ  OUTFIT MNSQ     ENTRY NUMBER  ITEM
     1      .78     41.8         .91         1.01  B             2  TMS
     1      .74     47.1         .83          .80  c             1  SO
     1      .02     56.6         .70          .71  b             3  FFI
     1     -.59     50.8         .66          .66  a             5  SGD
     1     -.55     61.0        1.71         1.68  A             6  LGD
     1     -.13     42.5         .97          .98  C             4  GI




Let us look at yet another example. This 
time, consider a student with a measure of 30 
(in Table 1-b). This student is expected to 
obtain 10 points in total. If this student gets 4 
points in FFI, there is something wrong with 
this student or the item. We should 
diagnostically analyze the reason for this 
misfitting case. 

Secondly, we will talk about the factor analysis. Nakamura (1999) states that two factors were obtained through a factor analysis using raw scores from the test results. He suggested that one factor reflects the number of people involved in the oral language activities, while the other is related to a linguistic element.

The present research will investigate the 
factors from another viewpoint by employing 
the Rasch analysis and explore the details of 
the factors. 
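
The headings of Tables 2 to 4 describe the factors as coming from a principal component analysis of standardized residual correlations for items. The Python sketch below shows the general idea for the largest component only, using plain power iteration. The 3-item correlation matrix is hypothetical and is not taken from the study, and the function name is illustrative. WINSTEPS may differ in details such as sign conventions.

```python
import math


def first_principal_component(corr, iterations=1000):
    """Return (eigenvalue, loadings) of the largest component by power iteration."""
    n = len(corr)
    vec = [float(i + 1) for i in range(n)]  # non-uniform start vector
    for _ in range(iterations):
        nxt = [sum(corr[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # Rayleigh quotient of the unit-length vector gives the eigenvalue.
    eigenvalue = sum(
        vec[i] * sum(corr[i][j] * vec[j] for j in range(n)) for i in range(n)
    )
    # Loadings are the eigenvector scaled by the square root of the eigenvalue.
    loadings = [math.sqrt(eigenvalue) * x for x in vec]
    return eigenvalue, loadings


# HYPOTHETICAL residual correlation matrix for three items, for illustration only.
corr = [
    [1.0, 0.4, -0.3],
    [0.4, 1.0, -0.2],
    [-0.3, -0.2, 1.0],
]

eigenvalue, loadings = first_principal_component(corr)
print("variance units explained:", round(eigenvalue, 2))
print("loadings:", [round(x, 2) for x in loadings])
```

Items whose loadings have opposite signs on a component are the ones described in this paper as contributing in opposite directions.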

Tables 2, 3, and 4 show that we are able to obtain 3 factors. The first factor is composed of two tests: TMS and SO. The second factor consists of LGD, and the third factor is contributed by GI. The rest of the items (tests) were not involved in the 3 main factors; however, SGD and LGD show a strong opposite contribution to the first factor, which indicates that the number of people in the test is important. Furthermore, FFI demonstrates an extremely strong reverse contribution to the second factor, which also suggests the significance of the number of people involved.
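
As a consistency check on the three tables, the variance units reported for each factor can be compared with the sum of the squared loadings in that table. This relies on the standard principal component identity that the squared loadings of a component sum to its eigenvalue; that WINSTEPS scales its loadings in this way is an assumption here. The Python sketch uses only the values printed in Tables 2, 3 and 4.

```python
# Loadings and reported variance units copied from Tables 2, 3 and 4.
tables = {
    1: {"reported": 1.83, "loadings": [0.78, 0.74, 0.02, -0.59, -0.55, -0.13]},
    2: {"reported": 1.57, "loadings": [0.79, 0.26, -0.76, -0.49, -0.24, -0.09]},
    3: {"reported": 1.14, "loadings": [0.96, -0.34, -0.26, -0.16, -0.12, -0.01]},
}

# Sum of squared loadings per factor.
recomputed = {
    factor: sum(x * x for x in row["loadings"]) for factor, row in tables.items()
}

for factor, row in tables.items():
    print(f"Factor {factor}: reported {row['reported']:.2f}, "
          f"recomputed {recomputed[factor]:.2f}")

# Share of the 6 variance units covered by the three reported factors together.
share = sum(row["reported"] for row in tables.values()) / 6
print(f"Three factors together: {share:.1%} of the 6 variance units")
```

The recomputed values agree with the reported ones to within rounding of the two-decimal loadings.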

Table 2 shows Factor 1. This factor can be called Monologue Ability because in SO and TMS students speak to the tape by themselves in the language laboratory, even though there is a semi-direct interaction between a student and the stimulus (which is heard from the recorded tape). As mentioned above, SGD and LGD contribute to Factor 1 in the opposite direction. This indicates that there is an important element in this factor which distinguishes Monologue Ability from Non-monologue Ability. In other words, one's Monologue Ability is different from one's Non-monologue Ability.

Table 3 demonstrates the second factor. This factor, Factor 2, is made up only of LGD, and can be named Multilogue Ability, because in LGD a student needs to demonstrate his / her discussion ability in a large, class-sized group (more than 10 people involved). Though TMS is included, it can be ignored as a contributing element to this factor, due to its small factor loading (below .30). In fact, TMS has already contributed to Factor 1 (Monologue Ability).
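
The .30 rule mentioned above can be applied mechanically to the loadings in Tables 2 to 4. The following Python sketch does so; applying the same cut-off to negative loadings, in order to list items that contribute in the opposite direction, is an assumption of this sketch rather than a rule stated in the paper, and the function name is illustrative.

```python
THRESHOLD = 0.30  # cut-off named in the text

# Loadings copied from Tables 2, 3 and 4.
loadings = {
    "Factor 1": {"TMS": 0.78, "SO": 0.74, "FFI": 0.02,
                 "SGD": -0.59, "LGD": -0.55, "GI": -0.13},
    "Factor 2": {"LGD": 0.79, "TMS": 0.26, "FFI": -0.76,
                 "SGD": -0.49, "GI": -0.24, "SO": -0.09},
    "Factor 3": {"GI": 0.96, "FFI": -0.34, "SGD": -0.26,
                 "LGD": -0.16, "TMS": -0.12, "SO": -0.01},
}


def salient(items, threshold=THRESHOLD):
    """Split items into positive and opposite contributors at the cut-off."""
    positive = [name for name, value in items.items() if value >= threshold]
    opposite = [name for name, value in items.items() if value <= -threshold]
    return positive, opposite


for factor, items in loadings.items():
    positive, opposite = salient(items)
    print(f"{factor}: positive {positive}, opposite {opposite}")
```

On the positive side this reproduces the groupings described in the text: TMS and SO for Factor 1, LGD for Factor 2, and GI for Factor 3.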

What should be noticed in Factor 2 is that FFI and SGD (especially FFI) contribute to this factor in the opposite direction. This indicates that there is an important element distinguishing LGD from FFI and SGD, which could be the number of people involved.

Table 4 shows the third factor, Factor 3. This factor is supported by GI, and can be called Dialogue Ability because a student should respond to questions asked by an interviewer (or interviewers), even though the situation is neither face to face nor one on one.

Although Nakamura (1999) started with six different tests to evaluate students' communicative language ability, and ended up with two factors (a person related factor and a linguistic factor), the result of the Rasch based analysis



Table 3

FACTOR 2 FROM PRINCIPAL COMPONENT ANALYSIS OF STANDARDIZED RESIDUAL CORRELATIONS FOR ITEMS
(SORTED BY LOADING) FACTOR 2 EXPLAINS 1.57 OF 6 VARIANCE UNITS

FACTOR  LOADING  MEASURE  INFIT MNSQ  OUTFIT MNSQ     ENTRY NUMBER  ITEM
     2      .79     61.0        1.71         1.68  A             6  LGD
     2      .26     41.8         .91         1.01  B             2  TMS
     2     -.76     56.6         .70          .71  b             3  FFI
     2     -.49     50.8         .66          .66  a             5  SGD
     2     -.24     42.5         .97          .98  C             4  GI
     2     -.09     47.1         .83          .80  c             1  SO




Table 4

FACTOR 3 FROM PRINCIPAL COMPONENT ANALYSIS OF STANDARDIZED RESIDUAL CORRELATIONS FOR ITEMS
(SORTED BY LOADING) FACTOR 3 EXPLAINS 1.14 OF 6 VARIANCE UNITS

FACTOR  LOADING  MEASURE  INFIT MNSQ  OUTFIT MNSQ     ENTRY NUMBER  ITEM
     3      .96     42.5         .97          .98  C             4  GI
     3     -.34     56.6         .70          .71  b             3  FFI
     3     -.26     50.8         .66          .66  a             5  SGD
     3     -.16     61.0        1.71         1.68  A             6  LGD
     3     -.12     41.8         .91         1.01  B             2  TMS
     3     -.01     47.1         .83          .80  c             1  SO




suggests that we need three tests to make a more precise measurement of students' communicative language ability: the first is a test for Monologue Ability, the second is a test for Multilogue Ability, and the third is a test for Dialogue Ability.

Putting it another way, we tend to think, usually for practical reasons, that one test will give enough information to understand students' speaking ability. However, the present Rasch analysis result indicates that we need to look at their communication ability from multidimensional viewpoints such as Monologue, Dialogue and Multilogue, so that we can conduct a more accurate measurement.

Through the TMS or SO tests we can measure Monologue Ability, in which students express their basic speaking ability on tape. By using a GI test, we can assess Dialogue Ability, in which students interact with a live interviewer in a small group. Through an LGD test, though it captures a unique aspect of speaking ability, we can measure Multilogue Ability, where students perform using their ability in argumentation, discussion and debating.

Granted that the ideal theoretical construct in Rasch analysis should be more or less unidimensional, we were, in practice, able to obtain three factors that give us different views or angles from which to look at communication ability. In other words, we can view communicative language ability as multidimensional (Monologue, Dialogue and Multilogue) from the practical viewpoint at each individual stage, though each dimension makes a great contribution on its own to constructing the whole unidimensional communicative language ability.

In conclusion, we have analyzed Oral Proficiency Test data using the Rasch measured scores by focusing on two things: 1) the item map, and 2) factor analysis. We have been able to utilize the idea of an item map in order to spot the level of difficulty of items and to gain a general view of students' expected responses.

The results of the factor analysis have provided information on the existence of three factors, which in practice are necessary to measure students' Communicative Language Ability more accurately. However, as a whole construct of language ability, the unidimensional view of Communicative Language Ability has also been proposed.






Acknowledgement 



I am grateful to Dr. Benjamin D. Wright and Dr. John M. Linacre for their invaluable comments. 



Bibliography 



Bode, R. K., and Wright, B. D. (1999). Rasch measurement in higher education. Reprinted from Higher Education: Handbook of Theory and Research, Vol. XIV (pp. 287-316). New York: Agathon Press.

Linacre, J. M. (1989, 1993, 1994). Many-Facet Rasch Measurement. Chicago: MESA Press.

Nakamura, Y. (1999). Measuring speaking skills through multidimensional performance tests. Educational Studies, 41, 99-113. Tokyo: International Christian University.

Rasch, G. (1960, 1980). Probabilistic models for some intelligence and attainment tests. Copenhagen, and Chicago: University of Chicago Press.

Wright, B. D. (1997). A history of social science measurement. Educational Measurement: Issues and Practice, Winter 1997, 33-45 & 52.

Wright, B. D. (1997). Fundamental measurement for psychology. In S. Embretson, and S. Hershberger (Eds.). The New Rules of Measurement: What Every Psychologist and Educator Should Know (pp. 65-104). Hillsdale NJ: Lawrence Erlbaum Associates.

Wright, B. D., and Linacre, J. M. (1998). A User's Guide to WINSTEPS / BIGSTEPS: Rasch-Model Computer Programs. Chicago: MESA Press.

Wright, B. D., and Masters, G. N. (1982). Rating Scale Analysis: Rasch Measurement. Chicago: MESA Press.

Wright, B. D., and Stone, M. H. (1979). Best Test Design: Rasch Measurement. Chicago: MESA Press.


